Research Explorations

This category contains 8 articles.

Foundational Models Paper Reading Collection 1

First published: 2026-03-20 | Last updated: 2026-04-10

Explorations & Insights

Foundational Models

14538 words

73 min

FlowRL - Matching Reward Distributions for LLM Reasoning

First published: 2026-02-09

Explorations & Insights

Large Models

Theory

Large language model (LLM) reasoning is typically formulated as a conditional generation problem: given a question $\mathbf{x} \in \mathcal{X}$, a policy model $\pi_{\theta}(\mathbf{y} \mid \mathbf{x})$ generates an answer $\mathbf{y} \in \mathcal{Y}$. The quality of the answer is evaluated by a task-specific reward signal $r(\mathbf{x}, \mathbf{y})$. In reasoning tasks, this reward is usually sparse and terminal (e.g., whether the final answer is correct), so we consider a single one-step reward rather than a return (i.e., a discounted sum of rewards over time steps).

1536 words

8 min

Self-Distillation

First published: 2025-12-27

Explorations & Insights

Pretraining Methods

This paper proposes DINO, a self-distillation framework that requires no labels, for pretraining ViTs. Beyond working well on this architecture, DINO produces features that exhibit two interesting emergent properties:

1005 words

5 min

On-Policy Distillation

First published: 2025-12-15

Explorations & Insights

Large Models

Theory

Today, large models are typically post-trained with RLHF, which makes them powerful but expensive to train and deploy. Smaller models, by contrast, are usually fine-tuned with SFT or knowledge distillation (KD); they are easier to deploy and adapt but often fall short of the performance of larger models.

944 words

5 min

Fourier and Wavelets for Deep Learning

First published: 2025-12-08

Explorations & Insights

Theory

Let $f \in L^2(\mathbb{R})$. The Fourier transform (in the $L^2$ sense) represents a signal as a superposition of global sinusoidal basis functions:

3373 words

17 min

Do we really need encoders for generative models?

First published: 2025-06-27

Explorations & Insights

Theory

In modern generative AI, encoders are commonly used during training to help models capture the context of the input data. However, these encoders are often removed at inference time. This raises an interesting question: if we train models using only decoders, can they still generate meaningful outputs?

529 words

3 min

Reverse KL Divergence

First published: 2025-06-25

Explorations & Insights

Theory

The KL divergence (KLD) is defined as

605 words

3 min

Introduction to MuJoCo

First published: 2025-06-13